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ABSTRACT 

We introduce a probabilistic (Bayesian) method for producing catalogs from 
images of crowded stellar fields. The method is capable of inferring the number 
of sources N in the image and can also handle the challenges introduced by over- 
lapping sources. The luminosity function of the stars can also be inferred even 
when the precise luminosity of each star is uncertain. This is in contrast with 
standard techniques which produce a single catalog, potentially underestimating 
the uncertainties in any study of the stellar population and discarding informa- 
tion about sources at or below the detection limit. The method is implemented 
using advanced Markov Chain Monte Carlo (MCMC) techniques including Re- 
versible Jump and Nested Sampling. The computational feasibility of the method 
is demonstrated on simulated data where the luminosity function of the stars is 
a broken power-law. The parameters of the luminosity function can be recovered 
with moderate uncertainties. We compare the results obtained from our method 
with those obtained from the SExtractor software and find that the latter sig- 
nificantly underestimates the number of stars in the image and leads to incorrect 
inferences about the luminosity function of the stars. 

Subject headings: catalogs — methods: data analysis — methods: statistical - 
stars: luminosity function, mass function 
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1. Introduction 

Traditional practice in astronomy is to take images of the sky, detect or enumerate 
sources visible in those images, and create catalogs. These catalogs are then used to perform 
the fundamental astronomical measurements, which might include something like the three- 
dimensional structure of the Galaxy or the two-point correlation function of galaxies. Indeed, 
the process of catalog construction is so "baked in" to our ideas about what astronomy is, 
we sometimes forget that the catalog is not the fundamental data product of astronomy; 
catalogs are produced from imaging; their production involves many decisions and ideas 
that go way beyond the information provided to the telescope by the incident intensity field. 
In addition, catalogs are not usually the final goal of any imaging project or survey. Typically, 
they are produced in order to facilitate the scientific study of populations of objects (e.g. 
the initial mass function of a population of stars), or to provide a sky-search capability to 
the community who might be interested in only a small subset of obje cts. Standard tools 



for ge nerating catal ogs from astro nomical imaging inclu de SExtractor ( Bertin and Arnouts 



19961 ). DAOPHOT flStetsonlll987u . and SDSS PHOTO flLupton et al.ll200lh . 



Telescopes don't make catalogs (IHogg and Langll201ll ). they measure the intensity field. 



Viewed through the lens of probabilistic inference, the goals of astronomy are to take the 
information in the telescope-generated records of the intensity field and deliver it to the 
quantities of astronomical interest with as little loss as possible. Insertion of a catalog- 
generation step in the inference pipeline between the raw imaging and the final astrophysical 
analyses is typically very lossy; the hard decisions of catalog making destroy information. 
Probability theory suggests that it will be much less lossy to pass forward not a catalog but 
a probabilisitic description of all the catalogs that could be consistent with the imaging — a 
posterior probability in (enormously large) catalog space. This article represents an attempt 
at this ambitous goal in the specific situation where the only objects in the field are stars. 

Beyond these extremely general concerns, there are practical issues: Standard methods 
for constructing catalogs can have difficulty in some challenging situations. For example, 
when multiple sources overlap partially or completely, it can be difficult to determine how 
many sources are present, and how much flux belongs to each source. In principle, uncertainty 
should be taken into account. Instead of simply estimating the position and flux of each 
object in the image, we should fully describe the fact that sometimes we are not certain of 
the position and flux of each source. This uncertainty should then be propagated into any 
inferences about the stellar population. 

Essentially, the creation of a catalog is an attempt to answer the question, "Given the 
image we have obtained, what objects are present in the field and what are their properties?" 
This motivates a probabilistic (Bayesian) approach to making catalogs. The output of such 
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an approach would not be a single answer to this question (i.e. a single "point estimate" 
catalog), but rather a posterior probability distribution over the space of possible catalogs. 
Numerically, Bayesian Inference is often implemented using Markov Chain Monte Carlo 
(MCMC) algorithms in order to generate random samples from the posterior distribution. 

Sampling a posterior probability distribution for catalogs is likely to be challenging for a 
number of reasons. Firstly, the number N of objects in the image (and that should therefore 
be listed in the catalog) is itself unknown. Secondly, if N is large, then the parameter space 
of positions and properties (flux, size, etc) of the objects is also large. This can cause Markov 
Chain Monte Carlo (MCMC) algorithms difficulties - they may take a long time to converge 
to the target posterior distribution over the space of catalogs. Thirdly, this problem is subject 
to t he so-called label -switching problem that is commonly encountered in mixture modeling 
(e.g. IJasra et alll2005l ). Given any proposed catalog, another catalog that is equally plausible 



is the catalog obtained by shuffling the entries of the first catalog. This leads to a posterior 
distribution with N\ identical peaks in parameter space. This can lead to difficulties with 
certain (generally very effective) MCMC algorithms such as the affine-invariant stretch move 
( IGoodman fc Wearell201Cll : iForeman-Mackey et al.l 120121 ). 



Bayesian object detection (as this prob lem is sometimes called) ha s been implemente d 



both inside a nd outside o f astro nomy (e.g. lHarkness and Green! |2000| ; iFeroz et al.l 1201 ll ). 



However, the IFeroz et al.l ( 1201 ll ) approach makes the assumptio n of a known number of 
objec ts N. This assumption is required for the MultiNest sampler (IFeroz. Hobson. fe Bridges 
20091 ) to be applicable. Using the results from the known iV run, it is possible (under certain 
circumstances) to reconstruct what the results would have been if an unknown-iV model 
had been used. However, this will not work well in situations where there is significant 
confusion (i.e. two or more sources overlap). What is really required is a variable dimension 
model, where N G {0,1,2,...} is an unknown quantity to be inferred from the data. The 
computational implementati on of these models will require tools such as reversible jump 
Markov Chain Monte Carlo ( IGreenlll995l ). Othe r statistical methods have also been used to 
model crowded fields (e.g. maximum likelihood. Ilrwinlll985l ). However, maximum likelihood 
is not completely appropriate in flexible models because it may lead to overfitting. In this 
situation overfitting would result in more stars being added to the model to explain small 
positive fluctuations in the image which are a ctually due to noise. Various other techniques 
have also been pro posed in the literature (e.g. iMetchev fc Grindlayll2002l ; iMagain et al.ll2007l ; 
Zhang et alJboogh . 
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2. Bayesian Inference 

To quantitatively model uncertainties and transform noise in observed data into un- 
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(denoted collectively by 9) and we expect to obtain some data x. Our prior state of knowledge 
about the parameters is modelled by a prior probability distribution: 



P (d). 



(i) 



Note that this is a very concise notation (jHoggil2012i ) and should be read as "the probability 
distribution for 9" . We also model how the parameters give rise to the data, via a generative 
model. This is also known as a sampling distribution: 



p(x\9). 



(2) 



Despite the singular, the sampling distribution is actually a family of probability distributions 
over the space of possible data sets, one probability distribution for each possible value of 
9. Note that the choice of the sampling distribution is also an assumption about prior 
knowledge: It models prior information about the fact that the data x is connected to the 
parameters 9 in some way Without this prior knowledge, learning is impossible: there has 
to be some relationship between the parameters and the data, otherwise new data tells you 
about nothing but itself. 

When specific data x* have been obtained, our state of knowledge about 9 gets updated 
from the prior distribution to the posterior distribution via Bayes' rule: 



p(9\x 



x 



oc 



p{9)p{x\9)\ x= 
p(9)C(9;x) 



(3) 
(4) 



The term p(x\9)\ x=x * = £(6;x) is the likelihood function, which is the probability of obtain- 
ing; the actual data set X clS cL function of the parameters. In the case that the sampling 
distribution is a probability density, the likelihood is the probability density function evalu- 
ated at the observed data. This usually cause s no problems, although one should be aware of 
the Borel-Kolmogorov paradox (jjaynesl 120031 ) . As suggested by the above notation, the like- 
lihood function is obtained from the sampling distribution with the actual data substituted 
in and is therefore a function of the parameters only. 

To proceed with the model for inferring catalogs (stellar positions and fluxes) from image 
data, we must specify a definite hypothesis space and choices for the prior distribution and the 
sampling distribution. These choices are presented and discussed in Section [3j In Section 0] 
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we briefly discuss our MCMC implementation. Section [5] describes the tests we carried out 
on simulated data, and a comparison with SExtractor results is presented in Section [6] We 
conclude in Section [7J 



3. The Specific Model for Stellar Fields 

3.1. The Hypothesis Space 

The hypothesis space is the set of possible catalogs, or the set of possible answers to 
the question, "What objects are present in the field and what are their properties?" We 
shall assume that there are an unknown number of stars N in the field. Each star has an 
unknown position (x, y) in the plane of the sky, and an unknown flux /. We also describe 
the distribution of fluxes (commonly known as the luminosity function) of the stars by some 
parameters denoted collectively by 0. In summary, the unknown parameters are: 

e = {N,P,{x i ,y i }l 1 ,{f i }? =1 }. (5) 

We note that mode ls similar to t his have been implemented for general image modeling and 



deconvolution (e.g. ISkilling||l998l ) . however in this case it is more justified as we are actually 



searching for point fluxes. 



3.2. The Prior 

The prior probability distribution for the unknown parameters can be factorized using 
the product rule of probability theory. With a variety of independence assumptions, the 
prior can be factorized as: 

N 

p(e)=p(/3)p(N\P)l[ P (x i ,y i ) P (fi\P) (6) 

8=1 

Here, we have assumed that the luminosity function does not depend on position. Finally, 
the fluxes of the stars come independently from a common distribution. If we knew the 
luminosity function of the stars, then the location and flux of a particular star would not 
tell us anything about the location and flux of another star. Really, this is just a way of 
implementing exchangeability of the stars. 

For simplicity, we assume a uniform prior probability distribution for the position of each 
star. This creates a strong preference for catalogs where the stars are uniformly distributed 
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across the image. Thus, this model is appropriate for small patches of sky where the density 
of stars is approximately uniform. In other scenarios, such as images of stellar clusters, it 
is possible to parameterize the spatial distribution of the stars in a similar way to how we 
have parameterized the luminosity function. 



3.3. The Sampling Distribution 



The sampling distribution is a probabilistic model for the process that generates the 
data; it describes the probability distribution we would use to predict the data if we happened 
to know the true catalog. In our case, the data will be an m x n array of pixel intensities /: 



where the central position of each pixel is: 

{Xij, Yij}. 



(7) 



(8) 



The image is assumed to be a noisy version of the true underlying intensity field. Thus, we 
need a prescription for simulating an image {Uj} from a catalog 9: 



9 = {N,(3,{x l ,y t }l 1 ,{f l }l 1 }. 



(9) 



If we knew the true catalog, we could compute the "mock image" we would expect to see in 
the absence of noise. This mock image (defined at every point on the sky) is given by: 



N 



M(x,y) = ^2fiV(x - x u y - y t ) 



(10) 



where V is the pixel-convolved point spread function (PSF). Throughout this paper we will 
assume the pixel-convolved PSF is a weighted mixture of two concentric circular Gaussians 
with widths s\ and s 2 : 



w 



+ 



1 — w 

2tts1 



exp 



-23 ^ 



The pixellated observed image is assumed to be generated from the mock image (evaluated 
at the center of each pixel) plus noise: 



Iij — Ai (X \ j, Yi j) + €ij 



(12) 
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where the errors {tij} are independent and normally distributed. The variance of each pixel 
is determined by the brightness of the sky and the brightness of the mock image in that 
pixel: 



jV(0, al + r]I k 



(13) 



where gq is a constant noise level and rj is a coefficient allowing the noise to be higher in 
brighter regions of the image. This parameterisation has been used by iBrewer et al.1 (120111 ) 
and is an alternative to the common practice of producing a "variance map" from the image 
data that is then assumed to be known. 



3.4. The Prior Distribution 

The prior distribution for the number of stars iV is assigned to be uniform between 
and some maximum number iV max . The extent of the image is assumed to be from x = x m i n 
to x = x max = x min + x range and from y = y min to y = y max = y min + y range in arbitrary units, 
and the positions of the stars are assigned independent uniform priors: 

Xi ~ Uniform(x min - 0.05x range ,x max + 0.05x range ) (14) 
Ui ~ Uniform(?/ min - 0.05y range ,y max + 0.05y range ) (15) 

The stars are allowed to be slightly outside of the observed image because the PSF can 
scatter light from these stars into the image. Our model for the luminosity function is a 
broken power-law with four free parameters: 

/3 = {h 1 ,h 2 ,a 1 ,a 2 }. (16) 

where hi is a lower flux limit, h 2 is a break-point, a.\ is the slope of the distribution between 
h\ and h 2 , and a 2 is the slope of the distribution above h 2 . For details on the broken 
power-law model, see Appendix [A] The prior distribution on hi, h 2 , aci, and a 2 is assigned 
to be: 

hxhi ~ Uniform(ln(10" 3 ),ln(10 3 )) (17) 
ln/ia ~ Uniform(ln(fci),ln(fci) + 2.3) (18) 
ai ~ Uniform(l,5) (19) 
a 2 ~ Uniform(l,5). (20) 

These priors express vague prior knowledge about ai and a 2 in addition to vague prior 
knowledge about hi and h 2 apart from the fact that the flux units are not extreme and that 
h 2 should be no more than an order of magnitude greater than hi. 
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This simply-parameterized model for the luminosity function can be criticized on the 
basis that information from bright stars can be used to infer the parameters of the luminosity 
function which then still apply at lower flux levels. I n principle, this can be resolved by using 



a more flexible distribution (e.g. Kelly et al.ll2008l ) where each star's measured brightness 



affects the inference of the luminosity function locally but not globally. 
The priors for the PSF parameters and the noise parameters were: 

lnsi ~ Uniform(ln(0.3L),ln(30L)) (21) 

lns 2 ~ Uniform(ln(si),ln(si) + 2.3) (22) 

w ~ Uniform(0, 1) (23) 

In (T ~ Uniform(ln(10- 3 ),ln(10 3 )) (24) 

In 77 ~ Uniform(ln(10" 3 ),ln(10 3 )) (25) 

where L = (x max — x min )/n is the width of a pixel. 



4. MCMC Implementation 



The MCMC sampling was implemented using the Diffusive Nested Sampling f Brewer. 
Partay, fc Csa nyi 1201 ll ) method (hereafter DNS). DNS is a variant of the Nested Sampling 
( ISkillingi 120061 ) algorithm that uses Metropolis-Hastings updates, and is very generally appli- 
cable. The main difference between DNS and the standard Metropolis-Hastings algorithm 
is that the target distribution is modified. Rather than simply exploring the posterior dis- 
tribution over catalog space, DNS constructs an alternative target distribution which is a 
mixture of the prior distribution with more constrained versions of the prior distribution. 
The modified target distribution assists the sampling in several ways. Firstly, the target 
distribution shrinks at a constant rate with time during the in itial phase of the exploration. 
This is similar to the popular "simulated annealing" method (IKirkpatrick et al.lll983l ; iNeal 
20011 ) but with an optimal annealing schedule. Secondly, communication with the prior is 
maintained: once a catalog is found that fits the data, the catalog can "disintegrate" back 
to the prior distribution and re- fit, allowing different peaks in the parameter space to be 
explored (if they exist). This all happens naturally within the context of a valid MCMC 
sampler. The MCMC may also be run using the standard Metropolis algorithm targeting 
the posterior distribution. 
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5. Simulated Data 

In order to test our approach, we applied the method to two illustrative simulated 
images generated from the above model. The purpose of this experiment was to test the 
computational feasibility of the model, as well as to compare the inferences from the model 
with those from more standard techniques. 

The true parameter values for the two simulated data sets are listed in Table [TJ The 
broken power-law parameter values were chosen so that roughly half of the stars' fluxes were 
below and above the break-point respectively. Figure [8] in Appendix |A] also shows the true 
flux distribution used for the simulated images. Each of the images is 100 x 100 pixels in 
extent and covers a range from —1 to 1 in arbitrary units for both the x and y axes. The 
first image contains ~100 stars (including stars just outside of the image) and the second 
image contains ~1000 stars. The two simulated images themselves are shown in Figure [TJ 



Test Case 1 



Test Case 2 



s> 




Fig. 1. — The two simulated images used to test our methodology. Left: A field containing 
~100 stars. Right: A field containing ~1000 stars. 
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5.1. Test Case 1 



Test Case 1 was run with the DNS algorithm and usable results were obtained within 
about an hour on a modern desktop PC. The inferences on the parameters N, hi, h 2 , atj., 
and «2 are shown in Figure [2j The number of stars is correctly inferred, and the posterior 
distributions for the other parameters comfortably contain the true input values. Note that 
the uncertainty in h 2 , ai, and a 2 is quite large. This is because the broken power-law model 
(Figure [8]) does not change drastically in shape as the parameters are varied. Therefore, 
a large number of stars would be required to tightly constrain the parameters. A Nested 
Sampling approach assis ts on this problem because of the presence of a first-order phase 
transition (jSkilling||2006l ). 




96 104 
Number of Stars (JV) 




Fig. 2. — Inference about the parameters for Test Case 1. Note that there is considerable 
uncertainty (particularly about h 2 ), which occurs because the shape of the broken power- 
law does not depend strongly on the parameters. The true input values are plotted as filled 
squares. 

The PSF parameters {si,s 2 ,w} and the noise parameters {a Q ,rj} were also inferred 
accurately with small uncertainties. 



5.2. Test Case 2 

Test Case 2 is more challenging than Test Case 1 because the image contains more 
stars. This increases the size of the computational task in a number of ways: firstly, there 
will be more unknown parameters to infer, so the convergence of the MCMC algorithm will 
take longer. Secondly, the time taken to compute the predicted image from a catalog (in 
order to evaluate the likelihood) is longer because of the larger number of stars. Using 
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DNS, some samples from the posterior distribution can be obtained in about a day on a 
modern PC. However, this problem does not have a first-order phase transition like Test 
Case 1. Therefore, it is tractable to run standard Metropolis-Hastings MCMC targeting the 
posterior distribution. In fact, this can be more efficient because the degree of compression 
from the prior distribution to the posterior distribution is very large (~ e 3000 ) in this case. 



900 1000 1100 

Number of Stars (N) 



J 0.2 




0.3 0.4 

hi 



0.5 



Fig. 3. — Inference about the parameters for Test Case 2. Note that the parameters are still 
not very well constrained even with the larger number of stars. The true input values are 
plotted as filled squares. 

Some summary results from Test Case 2 are shown in Figures H] and [51 Figure H] shows 
nine possible catalogs sampled from the posterior distribution. Features that are common 
to these nine samples are plausible, and features that differ are uncertain. 

Each catalog in the posterior sample represents a scenario for the true underlying image 
that we would observe if we had a hypothetical noise-free, infinite resolution telescope. From 
these samples, we can construct the posterior expected true scene. This is shown in Figure 
along with the blurred (but still noise-free) version and the residuals. The residuals provide 
a check on the validity of the model, and the posterior expected true scene provides a useful 
visual guide to the uncertainties present in the catalogs. 

As with test case 1, the PSF parameters {si, s 2 , ti>} and the noise parameters {cr , 77} 
were also inferred accurately with small uncertainties. 



6. Comparison to SExtractor 



In the previous section we established that the inference of the catalogs from the data 
is computationally feasible and that the number of stars and the luminosity function can 
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Fig. 4. — Nine example catalogs sampled from the posterior distribution for Test Case 2. 
Features in common represent features with high probability, and differences between the 
catalogs represent conclusions that are uncertain. The area of each circle is proportional to 
the flux of the star. 

be inferred from the image data, albeit with moderate uncertainty. We now compare this 
approach to an alternative analysis that makes use of the standard tool SExtractor. To 
achieve this, we executed SExtractor on the two test images, for various values of the 
detection threshold. This results in a set of catalogs for each image, with more conservative 
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Fig. 5. — Summary images for test case 2. The upper left panel shows the posterior mean 
high-resolution scene. The upper right panel shows the posterior mean scene when observed 
at the resolution of the data, and the bottom panels show the model residuals. 



thresholds resulting in less stars detected as compared to more aggressive thresholds. The 
results for the number of stars in the images are shown in Figure The main result is that 
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SExtractor significantly underestimates the number of stars in the image for all possible 
values of the detection threshold. 



Test Case 1 Test Case 2 




Detection Threshold (a) Detection Threshold (a) 



Fig. 6. — The number of stars in the catalogs produced by SExtractor compared with 
the actual number of stars in the images. SExtractor finds only a small fraction of the 
stars actually present. This is to be expected in the high-cr case as SExtractor then only 
searches for stars above a certain flux, but at the low-er end the number of stars is still 
drastically underestimated. Note that the true numbers of stars shown in these plots are 
~82% of the values listed in Figure [1] because here we only count those stars with positions 
in [-1,1] x [-1,1]. 

For Test Case 2, we then took the catalog produced by SExtractor for the 0.5<r threshold 
and fitted a broken power-law model to the list of fluxes in the catalog. Such an approach 
might be used to study the luminosity function of a set of stars in a field. The results of 
this inference are shown in Figure [7J The inference spuriously rules out the true values of 
the parameters. This occurs because the uncertainty in the stellar fluxes is not propagated 
through to the inference of the luminosity function. 

7. Discussion and Conclusions 

In this paper we have developed and demonstrated a Bayesian approach to making 
catalogs from astronomical images in the case where the image contains only stars (or other 
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Fig. 7. — Inference of the luminosity function parameters using the 0.5a catalog from 
SExtractor. Note that the inference appears to rule out regions around the true input 
values. This occurs because the uncertainties in the existence and fluxes of the faint stars 
are not taken into account. The true input values are plotted as filled squares. 

point sources). The key idea is that instead of computing a single catalog, the method creates 
a posterior probability distribution on the space of possible catalogs that represents our state 
of knowledge about the presence and properties of objects in the image. When this is done, 
the uncertainties in the imaging are accurately propagated through to scientific conclusions, 
for example about the luminosity function of the stars. This approach was contrasted with a 
standard method of running SExtractor and then fitting a luminosity function to the fluxes 
in the catalog. This latter approach was shown to produce incorrect inferences about the 
luminosity function. 

We note that there are many limitations to the model presented in this paper. In 
principle, our model should be a model of the physical state of the universe, and not a simple 
model where the only stellar properties are a 2-D position and a flux. Another limitation is 
that we have not considered multi-epoch or m ulti-band imaging . In the former case, PSF 



variations and stellar motions may be relevant ( jLang et al.ll2009l ). and in the latter, a model 



for the spectral energy distributions of the stars will need to be considered. 



In practice, it may also be necessary to improve the model for the prior distribution 
of stellar positions and fluxes. One area where this is clearly needed is the application of 
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this approach to images of stellar clusters. The model would need to be revised to take into 
account the fact that we expect the stars' positions to be clustered together, whereas the 
current model implies a large prior probability for the stars being scattered evenly across the 
image. In this and other applications, the luminosity function would also require multiple 
components, for example consisting of stars that are associated with a cluster or a stream 
and those that are not. 

Throughout this paper, we have also assumed that the pixel-convolved PSF can be 
adequately modeled using simple components and that there are no PSF variations across 
the field. Relaxing this assumption provides a significant challenge for the future. 
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The broken power-law distribution is based on a straightforward extension to a simple 
power-law distribution (also known as a Pareto distribution, particularly in the statistics 
literature). The power-law distribution for a variable x (given a lower cutoff x = h and a 
slope a) is defined by: 



In contrast, the broken power-law distribution for a variable x is defined by a lower cutoff 
x = hi, two slopes {ai, a 2 } and a break point x = h 2 : 



8. Acknowledgements 



A. 



Broken Power-Law Distribution 




x < h 
x > h. 



(Al) 




x < hi 
hi < x < h 2 
x > h 2 . 



(A2) 



The free parameters of the broken power-law are: 



(3 = {hi,h 2 ,ai,a 2 }. 



(A3) 
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With normalising terms included, the proportionality becomes an equality: 

{0, x < hi 

Z^x- ai -\ h 1 <x<h 2 (A4) 
Z^ 1 !-"*- 1 , x > h 2 . 

Two conditions will be used to determine the normalizers Z\ and Z 2 . Firstly, the probability 
density function (PDF) should be continuous at x = h 2 : 

Z^h^- 1 = Z^V" 1 (A5) 
=> Z 2 = Z x hf~ a2 (A6) 

The second condition is that the total probability must be 1: 

rtl2 POO 

/ Z^x- a '- l dx+ I Z 2 x x~ a2 ~ x dx = 1 (A7) 

J hi J h.2 

Z^a^ 1 [h- ai - h~ ai ] + Z 2 l a 2 l h~ a2 = 1 (A8) 

zrVM^-vi+^^^rV* = i (A9) 

=> Z x = a^[h^ ai -h 2 ai ]+h 2 ai a 2 l . (A10) 

The cumulative distribution (CDF) is a useful property of a probability distribution and is 
given by the antiderivative of the PDF: 

{0, x < hi 

{Z x a x y x (h~ ai - x~ ai ) , h x <x<h 2 (All) 
1 - {Z 2 u 2 y x x~ a \ x>h 2 . 

The inverse of the CDF is also useful and is given by: 

p-i (u) { [K ai - uZ iai ] - 1/ai , < u < 1 - (Z 2 a 2 )- l h 2 a2 

1 ' \ [Z 2 a 2 (l - u)}- 1/a2 , 1 - (Z 2 a 2 y l h 2 a2 <u<l. 1 1 

An example of a broken power-law distribution is shown in Figure [HJ 

B. Proposal Distributions 



To implement Metropolis-Hastings moves for the space of possible catalogs, proposal 
distributions are required. See Table [2] for a list of proposal distributions used in this study. 
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Parameter 


Value (Test Case 1) 


Value (Test Case 2) 


N 


100 


1000 


hi 


0.3 


0.3 


hi 


0.6 


0.6 


«i 


1.1 


1.1 


ot 2 


2 


2 


TO 


0.05 


0.3 


V 








Sl 


0.02 


0.02 




0.1 


0.1 


w 


0.5 


0.5 



Table 1: True parameter values used to generate the simulated data. The main difference 
between the two test cases is the number of stars and the noise level, both of which are 
higher in test case 2. 



Parameter 


Proposal 


Notes 


N 


V -)■ V + S N 


Generate 5n new stars from p(x, y, f\(3). 


N 


N -> N -S N 


Remove Sn stars, chosen at random 


P 




Transform stars' fluxes correspondingly 


P 


P^0 + 5 


Fix stars' fluxes, put extra term in acceptance probability 




(xi, yi) ->■ (xu yC} + (S x , Sy) 


Can move > 1 star in a single step 


f 


f->f + s f 


Can move > 1 stars' fluxes in a single step 



Table 2: All 5 parameters are drawn from multi-scale distibutions such that the largest steps 
are of order the prior width, and the smallest steps are of order 10~ 6 times the prior width. 
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x lo gio x 

Fig. 8. — A broken power-law distribution. The parameter values for this particular PDF 
were {hi, hi, oli, 0:2} = {0.3,0.6,1.1,2}, i.e. the same parameter values used to make the 
simulated data. 
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